Reimport of carbon from cytosolic and vacuolar sugar pools into the Calvin–Benson cycle explains photosynthesis labeling anomalies

Significance Photosynthesis metabolites are quickly labeled when 13CO2 is fed to leaves, but the time course of labeling reveals additional contributing processes involved in the metabolic dynamics of photosynthesis. The existence of three such processes is demonstrated, and a metabolic flux model is developed to explore and characterize them. The model is consistent with a slow return of carbon from cytosolic and vacuolar sugars into the Calvin–Benson cycle through the oxidative pentose phosphate pathway. Our results provide insight into how carbon assimilation is integrated into the metabolic network of photosynthetic cells with implications for global carbon fluxes.


T1. Derivation of Polyexponential Models from Analytical Solutions of Compartmental Models
In Figure S1, we show a simplified model of photosynthetic carbon assimilation with three compartments and rates V1-V5. Assuming first-order or pseudo-first-order kinetics: We define the differential operator D, where [F] is a stand-in for any compartment's concentration: Given definitions (E1-E5) and notation from E6, the rates of change of compartments X, Y, and Z are: Through a series of substitutions, it can be shown that this system of differential equations simplifies to a linear homogenous differential equation of the 3 rd order: Where the coefficients a, b, and c are combinations of rate constants such that: = ! + " + # + $ + % ( 11) = ( ! + " )( # + $ + % ) ( 12) = ! # % ( 13) Which are all constants. The general solution to a linear homogenous differential equation with constant coefficients is of the form: Where m is some constant. From E10 and E14, we get the characteristic polynomial: This cubic polynomial has three roots, including repeating and complex roots. Due to the linearity of the system, its general solution is a linear combination of its roots, such that: Solving this cubic polynomial for biochemically reasonable estimates of k1 through k5 results in three real and negatively valued roots, making the general result of an identical form as the triexponential decay models we fit our data to in this study. The analytical solutions to the differential equations or systems of differential equations describing single and two-compartment models, likewise, correspond to single exponential and biexponential functions, respectively.

Nonlinear regression and bootstrapping
Fitting of % 12 C remaining data to polyexponential models was performed in Python using the curve_fit() function implemented in the SciPy package (1). We performed all regressions 100 times with uniformly sampled initial parameter values and selected the fit with the lowest SSR for further analysis. N = 1000 bootstrap resampling with replacement was performed using functions from the Python package recombinator. Due to the time-course structure of the data, circular block bootstrapping was used to preserve some of the dependence structure between subsequent measurements (2). Bootstrap samples were fitted using the same general procedure as that used to generate the best-fit lines, with the exception that the initial guesses for the parameter values for the regression of the bootstrap samples were set to the best-fit parameter values. 95% confidence intervals for each parameter were derived by taking the 2.5 th and 97.5 th percentile values of the resulting distributions of all successful fits.

Data treatment for heteroskedastic residuals and outlier identification
Heteroskedastic residuals from our nonlinear regressions were corrected using a logit transformation (3). Specifically, we performed nonlinear regression on models of the form: This preserves the relationship between our response, independent variables, and estimated parameters, allowing for straightforward interpretation while substantially reining in the heteroskedasticity of the residuals.
Studentized residuals were calculated for all model fits and datapoints whose studentized residuals exceeded an absolute value of 3 were excluded (N = 5). Due to the substantial impure heteroskedasticity in the Model 1 fits studentized residuals greater than 3 in Model 1 fits were ignored for the purposes of outlier removal.

Model selection criteria
Extra-sum-of-squares: For each nested pair of models we calculated the probability that, given the null hypothesis that the simpler of the two models is true, we would see the observed improvement in model fit as measured by the sum-of-squared residuals (SSR) (4). We calculate an F statistic as follows: Where SSRsimple and SSRcomplex are the SSR values for the simpler and complex -i.e., fewer and more parameters -models, respectively, and DFsimple and DFcomplex are the degrees of freedom for the two models. The F statistic resulting from E19 was then compared to the F-distribution to derive a p-value representing the probability of observing this F statistic given our null hypothesis, which is that our simpler model is correct. For this study, we set = 0.05 and used the Holm-Bonferroni correction (5) to adjust our p-value cutoff to one that corresponds to a family-wise α of 0.05. For each p-value Pk in the family of hypothesis tests being tested, we evaluate the following expression: where is the family-wise we are adjusting to, m is the number of hypothesis tests being conducted, and k is the rank of the p-value Pk in a ranked list of increasing p-values.
We selected the best-supported model for a given dataset by starting with the single exponential model and adding more parameters until we got to a model comparison that did not meet our adjusted p-value cutoff, in which case we went with the simpler model in the comparison. In cases where there was a comparison of two more complex models than the one we arrived at using the method just described that yielded a low p-value, we calculated p-value for the F-statistic comparison between the more complex of those two and the accepted model. If we were justified in rejecting the null hypothesis that the simpler model is better in this case, we went with the more complex model.

Cross-validation:
For this study we used the cross_validate() function from the SciKitLearn package to perform between 5 and 10 iterations of 5-fold cross-validation on our datasets [ (6,7). The same nonlinear ordinary least squares fitting procedure used for our best-fit parameter estimation on the full datasets was used for our cross-validation, with the only difference being that the fitting was done 5 times with different randomly selected bins of data for training and testing, resulting in 5 estimates of prediction error for each alternative model at each iteration. After 5-10 iterations, we took all the negative mean squared error estimates for each model for a given metabolite or aggregated metabolite dataset and then calculated their mean value and 95% confidence interval (± 1.96 SE). The model with the lowest average error and whose 95% CI does not overlap with the next simplest model in terms of the number of fitted parameters was chosen as the best-performing model for each dataset.
AIC/BIC: For each best-fit of Models 1-7, the AIC (8) and BIC (9) were calculated as follows: where k is the number of estimated parameters in the model, and n is the sample size. The bestsupported model for each dataset was chosen by identifying the model with the lowest AIC/BIC value that is not within two absolute units of a simpler (i.e., fewer parameters) model.

T3. Calculation of vo/vc
We begin with the equation from (10) where A is the net rate of CO2 assimilation (uptake), vC is the velocity of carboxylation, vO is the velocity of oxygenation, and RL is all other sources of CO2 release in the light, possibly primarily CO2 released by the glucose 6-phosphate shunt (11). Next, we define and so We can also estimate vO.
and so

27)
Taking the ratio of equations and canceling (A+RL) We can expand Φ as (10) where Γ * is the CO2 compensation point in the absence of RL. Therefore, Where C is the CO2 partial pressure equivalent at the sites of carboxylation. This is determined by where Ci is the partial pressure of CO2 in the intercellular air spaces of the leaf (estimated from gas exchange) and gm is the mesophyll conductance for CO2 diffusion. In the absence of a direct measurement gm can be estimated as Based on multiple measurements reported in (12) We can parameterize as follows based on measured gas exchange of the leaves used for this data set  (11). Leaf temperature fell below 0°C between 100 and 500 ms depending on location withing the chamber. The frozen leaf sample was stored at -80°C. There were three biological replicates for data points from 0-90 min, and two biological replicates at 120 min. The fit for all the tested models were accepted based on χ2 test of the sum-of-squared residuals (SSR). Global best fit SSR were calculated by parameter continuation analysis. Fatty acid synthesis rate is constrained to 0.0329-0.4405 μmol CO2 g -1 FW h -1 by combining the previous measurements of 0.049-0.067 μmol CO2 m −2 s -1 with 0.005-0.012 μmol CO2 m −2 s -1 (11,16). RL is constrained to 8.1-10.7 μmol CO2 g -1 FW h -1 based on previous measurement (11). vo/vc is constrained to 0.3-0.32 based on measurement in this study. p13C is the measured 13 C enrichment; p12C is the measured 12 C enrichment; n is the number of 13 C carbon; m is the number of total carbons; mCn is the combination for choosing objects of n from the total number of objects of m.        Lowest value 50 th percentile highest value   Table S7. vo/vc for models with and without labeling input for serine and glycine, with and without constraints of vo/vc. Four scenarios were tested: 1) with serine and glycine labeling input, unconstrained vo/vc; 2) with serine and glycine labeling input, constrained vo/vc = 0.31 +/-5%; 3) without serine and glycine labeling input, unconstrained vo/vc; 4) without serine and glycine labeling input, constrained vo/vc = 0.31 +/-5% based on the variation in the gas exchange data used to determine vo/vc.  Table S8. Additional modeling to test (a) whether addition of reactions representing starch turnover to the metabolic model meaningfully improves the agreement between the measured and simulated labeling and other flux data; and (b) whether the fitting of such models to the data indicates a biologically significant flux through starch turnover.